Reinforcement Learning: Model-free

Author

  • Chris R. Sims
Abstract

Simply put, reinforcement learning (RL) is a term used to indicate a large family of different algorithms that all share two key properties. First, the objective of RL is to learn appropriate behavior through trial-and-error experience in a task. Second, in RL, the feedback available to the learning agent is restricted to a reward signal that indicates how well the agent is behaving, but does not indicate specifically how the agent could improve its behavior. For example, consider writing an essay for a course and receiving a numerical score in the range 0–100. If your score is less than perfect, you know that your performance could be improved, but the feedback itself doesn’t indicate specifically how your essay should have been different. In more complex cases, optimal behavior will generally require numerous separate decisions, and there may be delayed or missing reward signals. For example, one can imagine trying to teach a computer to play checkers by providing a positive reward signal every time it wins a game, and a negative (penalty) signal every time it loses. In this case, each individual action (moving a piece) is not rewarded, and the only reward signal is provided at the end of a game. Learning how to improve behavior given this limited type of feedback is both the goal and the challenge facing all RL algorithms.
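As a rough illustration of learning from a reward that arrives only at the end of an episode, in the spirit of the checkers example above, the following Python sketch applies tabular Q-learning to a toy corridor task. Q-learning is one common model-free RL algorithm, chosen here only for illustration; the abstract does not single out any particular algorithm. The corridor environment, the constants, and the function names (`step`, `train`, `greedy_policy`) are all hypothetical.

```python
import random

# Hypothetical toy task: a corridor of states 0..6.
# State 0 is a "loss" (reward -1), state 6 is a "win" (reward +1).
# Every other move earns no reward, so feedback arrives only at the end.
N_STATES = 7
START = 3
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == 0:
        return nxt, -1.0, True
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore: pick a random action
            else:
                best = max(q[state])  # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # The target uses only the observed reward and the current
            # value estimates; no model of the environment is learned.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Greedy action (-1 or +1) for each non-terminal state 1..5."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])]
            for s in range(1, N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the learned move for each non-terminal state. Although no individual move is ever rewarded, the end-of-episode signal propagates backward through the value estimates, so the agent should come to prefer moving toward the winning end from the start state.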


Similar resources

Reinforcement learning based feedback control of tumor growth by limiting maximum chemo-drug dose using fuzzy logic

In this paper, a model-free reinforcement learning-based controller is designed to extract a treatment protocol because the design of a model-based controller is complex due to the highly nonlinear dynamics of cancer. The Q-learning algorithm is used to develop an optimal controller for cancer chemotherapy drug dosing. In the Q-learning algorithm, each entry of the Q-table is updated using data...


Operation Scheduling of MGs Based on Deep Reinforcement Learning Algorithm

In this paper, the operation scheduling of Microgrids (MGs), including Distributed Energy Resources (DERs) and Energy Storage Systems (ESSs), is proposed using a Deep Reinforcement Learning (DRL) based approach. Due to the dynamic nature of the problem, it is first formulated as a Markov Decision Process (MDP). Next, the Deep Deterministic Policy Gradient (DDPG) algorithm is presented t...


Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning

Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit us...


Bridging the Gap between Reinforcement Learning and Knowledge Representation: A Logical Off- and On-Policy Framework

Knowledge representation is an important issue in reinforcement learning. In this paper, we bridge the gap between reinforcement learning and knowledge representation by providing a rich knowledge representation framework, based on normal logic programs with answer set semantics, that is capable of solving model-free reinforcement learning problems for more complex domains and exploits the domain...


Modelling Motivation as an Intrinsic Reward Signal for Reinforcement Learning Agents

Reinforcement learning agents require a learning stimulus in the form of a reward signal in order for learning to occur. Typically, this reward signal makes specific assumptions about the agent’s external environment, such as the presence of certain tasks which should be learned or the presence of a teacher to provide reward feedback. For many complex, dynamic environments, design time knowledg...


Efficient Bayesian Nonparametric Methods for Model-Free Reinforcement Learning in Centralized and Decentralized Sequential Environments

By Miao Liu, Department of Electrical and Computer Engineering, Duke University.



Publication date: 2012